Compaction Techniques for Nextword Indexes
Abstract
Most queries to text search engines are ranked or Boolean. Phrase querying is a powerful technique for refining searches, but is expensive to implement on conventional indexes. In other work, a nextword index has been proposed as a structure specifically designed for phrase queries. Nextword indexes are, however, relatively large. In this paper we introduce new compaction techniques for nextword indexes. In contrast to most index compression schemes, these techniques are lossy, yet, as we show, they allow full resolution of phrase queries without false-match checking. We show experimentally that our novel techniques lead to significant savings in index size.
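For readers unfamiliar with the structure, a nextword index can be pictured as a map from each word to the words that follow it, with each (word, next word) pair carrying the positions where it occurs. The following is a minimal sketch under that assumption, not the authors' implementation: the function names and toy documents are hypothetical, the index is held uncompressed in memory, and none of the lossy compaction techniques introduced in the paper are shown.

```python
from collections import defaultdict


def build_nextword_index(docs):
    """Map each adjacent word pair (first, next) to its occurrences.

    index[first][next] is a list of (doc_id, position) tuples, where
    position is the word offset of `first` within the document.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos in range(len(words) - 1):
            index[words[pos]][words[pos + 1]].append((doc_id, pos))
    return index


def phrase_query(index, phrase):
    """Return sorted (doc_id, start_position) matches for a phrase of 2+ words."""
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("this sketch handles phrases of two or more words")
    # Occurrences of the first word pair are the candidate phrase starts.
    candidates = set(index.get(words[0], {}).get(words[1], []))
    # Every later pair must occur at the matching offset in the same document.
    for offset in range(1, len(words) - 1):
        pair = set(index.get(words[offset], {}).get(words[offset + 1], []))
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in pair}
    return sorted(candidates)


# Hypothetical toy collection, for illustration only.
docs = [
    "the quick brown fox",
    "a quick brown dog and the quick brown fox",
]
index = build_nextword_index(docs)
matches = phrase_query(index, "quick brown fox")
```

Because every word pair is stored with its own position list, a phrase is resolved by intersecting pair lists rather than scanning the long inverted lists of common words; the same per-pair storage is why the structure is large.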
Similar resources
Efficient Phrase Querying with an Auxiliary Index
Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...
What's Next? Index Structures for Efficient Phrase Querying
Text retrieval systems are used to fetch documents from large text collections, using queries consisting of words and word sequences. A shortcoming of current systems is that word-sequence queries, also known as phrase queries, can be expensive to evaluate, particularly if they include common words. Another limitation is that some forms of querying are not supported; an example is phrase comple...
Optimised Phrase Querying and Browsing of Large Text Databases
Most search systems for querying large document collections—for example, web search engines—are based on well-understood information retrieval principles. These systems are both efficient and effective in finding answers to many user information needs, expressed through informal ranked or structured Boolean queries. Phrase querying and browsing are additional techniques that can augment or repl...
Compaction of Coarse-Textured Soils: Balance Models across Mineral and Organic Compositions
Soil bulk density (BD), degree of compactness (DC), maximum bulk density (MBD), and critical water content (CWC) at which MBD is reached are commonly used to characterize soil compaction, and can be predicted from soil texture and organic matter content, omitting other components such as sand sub-classes and soil cementing agents and potential biases such as data redundancy and sub-compositiona...